January 10, 2018

Geospatial Data in R

Workshop Goals

  • Intro to geospatial data

  • Intro to coordinate reference systems (CRS)

  • Classes & methods for working with spatial data in R

  • Mapping geospatial data

  • Practice

About Me

About you

Who are you?

Why are you here?

Before we begin

Geographic Data

Geographic Data

Data about locations on or near the surface of the Earth.

Place names

Convey geographic information but don't specify location on the surface of the Earth.

Geospatial data

represent location geometrically with coordinates

46.130479, -117.134167

Coordinate Reference System

Coordinates indicate specific locations on the Earth when associated with a geographic coordinate reference system or CRS.

Geographic CRS

Specifies

  1. the shape of the Earth as the major & minor axes of an ellipsoid
  2. the origin (0,0) - equator (~X), prime meridian (~Y)
  3. fit of the CRS to the Earth (center of the earth), aka Datum
  4. units, eg lat/lon expressed as decimal degrees or DMS

Because of variations in 1-3, there are many geographic CRSs!

The World Geodetic System of 1984 is the default geographic CRS used today.

Map Projections

A Projected CRS applies a map projection to a Geographic CRS

Map Projection: mathematial transformation from curved to flat surface

Projected CRSs

There are many, many, many projected CRSs

All introduce distortion, eg in shape, area, distance, direction

No best one for all purposes

Selection depends on location, extent and purpose

Different CRSs

Spatial Data

Spatial data is a more generic term that is not just for geographic data.

Spatial data are powerful because

  • dynamically determine spatial metrics like area and length,
  • characteristics like distance and direction, and
  • relationships like inside and intersects from these data.

Types of Spatial Data

Vector Data

Points, lines and Polygons

Raster Data

Regular grid of cells (or pixels)

  • We won't be covering Raster data in this workshop*

Softare for working with Geospatial Data

  • Why special software?

Software for working with Geospatial Data

Most software can only query and analyze numbers and text

  • Databases, spreadsheets, statistical software packages

Geospatial data require software that can import, create, store, edit, visualize and analyze locations represented geometrically as coordinates referenced to the surface of the earth.

GIS - Geographic Information Systems

Software that can import, create, store, edit, visualize and analyze locations represented geometrically as coordinates georeferenced to the surface of the earth.

A geospatial database of information collected for a particular purpose, usually to answer specific questions.

Types of GIS

Desktop GIS - ArcGIS, QGIS

Web GIS - ArcGIS Online CARTO

Spatial Databases - Postgresql

Custom software - leaflet web maps

Why R for Geospatial Data?

Why R for Geospatial Data?

You already use R

Reproducibility

Free & Open Source

Cutting edge

Thousands of Cool add-ons

  • Shiny, rleaflet

Geospatial Data Files

Geospatial Data Files

Geodata can be stored in many different types of files.

The simplest is point data store in a CSV file.

  • Comma Separated Values

CSV file

"ID","name”,”x”,”y”,”taste","price","crowded","food"
"1","babette",-122.255374,37.868428,10,4,1,1
"2","musical",-122.260698,37.868383,7,3.25,1,1
"3","starbucks",-122.266057,37.870441,6,2.95,1,0
"4","yalis",-122.266385,37.873528,7,2.95,0,0
"5","berkeleyesp",-122.268681,37.873664,3,3.25,1,0
"6","fertile",-122.268863,37.874934,5,3.25,0,1
"7","guerilla",-122.269206,37.877949,9,3.5,1,1
"8","philz",-122.269221,37.878093,10,4,1,0
"9","nefeli",-122.2603,37.875417,6,3.2,0,1
"10","goldenbear",-122.259613,37.869632,1,2,1,0

Geospatial Data in R

Geospatial Data in R

There are many approaches to and packages for working with geospatial data in R.

One approach is to keep it simple and store geospatial data in a data frame

Sample Data

sfhomes <- read.csv('data/sf_properties.csv')  
head(sfhomes,6)
##   FiscalYear SalesDate                              Address YearBuilt
## 1       2015      <NA> 0000 0019 RIVERTON            DR0000      1949
## 2       2015      <NA> 0000 1335 MASONIC             AV0000      1900
## 3       2015      <NA> 0000 1256 28TH                AV0000      1941
## 4       2015      <NA> 0000 0088 ENTRADA             CT0000      1927
## 5       2015      <NA> 0000 0090 MONTCALM            ST0000      1938
## 6       2015      <NA> 0000 0025 GOLETA              AV0000      1936
##   NumBedrooms NumBathrooms NumRooms NumStories NumUnits AreaSquareFeet
## 1           0            0        6          1        1           1455
## 2           0            0        8          2        1           2190
## 3           0            0        7          1        1           2040
## 4           0            0        8          2        1           2413
## 5           0            0        6          1        1           1770
## 6           0            0        6          1        1           2260
##   ImprovementValue LandValue       Neighborhood
## 1            66386     33617    Sunset/Parkside
## 2            61811     38224     Haight Ashbury
## 3            64357     35679    Sunset/Parkside
## 4            62217     37827 West of Twin Peaks
## 5            74076     25975     Bernal Heights
## 6            64182     35870    Sunset/Parkside
##                                Location SupeDistrict totvalue SalesYear
## 1 (37.7336426217969, -122.487523696156)            7   100003        NA
## 2 (37.7685762370234, -122.445265935163)            5   100035        NA
## 3 (37.7639276775532, -122.486496732295)            4   100036        NA
## 4 (37.7240546670353, -122.468263348608)            7   100044        NA
## 5 (37.7467909269629, -122.406602489544)            9   100051        NA
## 6 (37.7350415121305, -122.483500474243)            4   100052        NA
##        lat       lon
## 1 37.73364 -122.4875
## 2 37.76858 -122.4453
## 3 37.76393 -122.4865
## 4 37.72405 -122.4683
## 5 37.74679 -122.4066
## 6 37.73504 -122.4835

Plot of points

plot(sfhomes$lon, sfhomes$lat)

Nice Maps with ggplot2

library(ggplot2)

ggplot() + geom_point(data=sfhomes, aes(lon,lat), col="red", size=1)

Subset on 2015 sales and plot by Property Value (totvalue)

d2015 <- subset(d1, as.numeric(SalesYear) == 2015)
nrow(d2015)
hist(d1$totvalue)
ggplot() + geom_point(data=d2015, aes(x=lon, y=lat, col=totvalue))

ggmap extends ggplot

sf_map <- get_map("San Francisco, CA")

ggmap(sf_map) +
  geom_point(data=d2015, aes(x=lon, y=lat, col=totvalue))

Customize get_map

sf_ctr <- c(lon = mean(d1$lon), lat = mean(d1$lat))
sf_ctr  # take a look
sf_basemap <- get_map(sf_ctr, zoom=12, scale=1)

ggmap(sf_basemap) +
  geom_point(data=d2015, aes(x=lon, y=lat, col=totvalue))

Change the basemap

# ?get_map to see options
sf_basemap_lite <- get_map(sf_ctr, zoom=12, scale=1, 
                            maptype = "toner-lite", source="stamen")

ggmap(sf_basemap_lite) +
  geom_point(data=d2015, aes(x=lon, y=lat, col=totvalue))

Facet -nating

# Let's look at last 5 years
d2010_16 <- subset(d1, as.numeric(SalesYear) > 2005)

ggmap(sf_basemap_lite) +
  geom_point(aes(lon, lat, col=totvalue), data = d2010_16 )  +
  facet_wrap(~ SalesYear)

Challenge

Redo above facet map with the following changes:

  • Use data from 1995 - 1999
  • Use a different basemap (eg maptype="terrain")

GGMap and GGPlot are great but…

BUT!

There are limits to what you can do with geospatial data stored in a dataframe

and mapping the data with ggplot and ggmap

Shapefiles

The most common format for non-point geospatial data.

Can't directly read or plot a shapefile with data frames/ggplot/ggmap

Can't transform Coordinate Data

Spatial Data Objects in R

Why we like spatial objects

  • Data cleaning! Before Mapping

  • Better management of complex spatial representations
    • CRS tranformations and geoprocessing
  • Compute spatial metrics and relationships

  • Data linkages - when data are in same CRS

sp Package

The sp Package

Classes and Methods for Spatial Data

The SP package is most commonly used to construct and manipulate spatial data objects in R.

Hundreds of other R packages that do things with spatial data typically build on SP objects.

sf package

The sf, or simple features package in R has many improvements

Based on open standards for specifying spatial data

But most spatial packages still depend on sp

So, live on the bleeding edge or check back in a year or so.

sp package

library(sp)
getClass("Spatial") # See all sp object types
## Class "Spatial" [package "sp"]
## 
## Slots:
##                               
## Name:         bbox proj4string
## Class:      matrix         CRS
## 
## Known Subclasses: 
## Class "SpatialPoints", directly
## Class "SpatialMultiPoints", directly
## Class "SpatialGrid", directly
## Class "SpatialLines", directly
## Class "SpatialPolygons", directly
## Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
## Class "SpatialPixels", by class "SpatialPoints", distance 2
## Class "SpatialMultiPointsDataFrame", by class "SpatialMultiPoints", distance 2
## Class "SpatialGridDataFrame", by class "SpatialGrid", distance 2
## Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
## Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
## Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2

```

sp Vector Objects

Geometry Spatial Object Spatial Object with Attributes
Points SpatialPoints SpatialPointsDataFrame
Lines SpatialLines SpatialLinesDataFrame<
Polygons SpatialPolygons SpatialPolygonsDataFrame

We use the S*DF objects most frequently!

Let's take a look

Read in point Data

Airbnb Rental data from http://insideairbnb.com

  • subsetted and used only for demonstration purposes
rentals <- read.csv('data/sf_airbnb_2bds.csv')
class(rentals)
## [1] "data.frame"
dim(rentals)
## [1] 1206   14

Examine Data Structure

str(rentals)
## 'data.frame':    1206 obs. of  14 variables:
##  $ id                  : int  91957 18139835 172943 2309140 2559297 149108 10832295 15134083 283235 8013423 ...
##  $ name                : Factor w/ 1203 levels " Victorian Suite in Inner Mission",..: 1091 267 917 1128 479 904 349 696 1064 976 ...
##  $ latitude            : num  37.8 37.8 37.8 37.8 37.8 ...
##  $ longitude           : num  -122 -122 -122 -122 -122 ...
##  $ property_type       : Factor w/ 4 levels "Apartment","Condominium",..: 1 1 3 1 1 3 2 1 1 1 ...
##  $ room_type           : Factor w/ 1 level "Entire home/apt": 1 1 1 1 1 1 1 1 1 1 ...
##  $ accommodates        : int  6 4 3 4 4 3 6 4 4 4 ...
##  $ bathrooms           : num  1.5 1 1 1 1 1 1 1 1 2 ...
##  $ bedrooms            : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ beds                : int  4 2 2 2 2 2 2 2 2 2 ...
##  $ price               : int  300 275 279 395 235 300 450 349 199 235 ...
##  $ review_scores_rating: int  94 100 100 97 86 100 100 92 96 93 ...
##  $ neighbourhood       : Factor w/ 52 levels "Alamo Square",..: 7 52 7 52 52 7 52 26 52 20 ...
##  $ listing_url         : Factor w/ 1206 levels "https://www.airbnb.com/rooms/1003756",..: 1156 490 446 666 689 293 55 311 711 1073 ...

Examine Data Content

head(rentals)
##         id                                      name latitude longitude
## 1    91957                    Sunny SF Flat with 5*s 37.76194 -122.4493
## 2 18139835 Brand New Apartment in the Heart of NOPA! 37.77568 -122.4453
## 3   172943        Redwood Cottage on Ashbury Heights 37.76022 -122.4495
## 4  2309140                               The Lady Di 37.77524 -122.4394
## 5  2559297       Entire 2BR w/pkg centrally-located! 37.77507 -122.4409
## 6   149108          QUIET LUXURY COLE VALLEY PARKING 37.76101 -122.4507
##   property_type       room_type accommodates bathrooms bedrooms beds price
## 1     Apartment Entire home/apt            6       1.5        2    4   300
## 2     Apartment Entire home/apt            4       1.0        2    2   275
## 3         House Entire home/apt            3       1.0        2    2   279
## 4     Apartment Entire home/apt            4       1.0        2    2   395
## 5     Apartment Entire home/apt            4       1.0        2    2   235
## 6         House Entire home/apt            3       1.0        2    2   300
##   review_scores_rating         neighbourhood
## 1                   94           Cole Valley
## 2                  100 Western Addition/NOPA
## 3                  100           Cole Valley
## 4                   97 Western Addition/NOPA
## 5                   86 Western Addition/NOPA
## 6                  100           Cole Valley
##                             listing_url
## 1    https://www.airbnb.com/rooms/91957
## 2 https://www.airbnb.com/rooms/18139835
## 3   https://www.airbnb.com/rooms/172943
## 4  https://www.airbnb.com/rooms/2309140
## 5  https://www.airbnb.com/rooms/2559297
## 6   https://www.airbnb.com/rooms/149108

Visualize Data

hist(rentals$price)

Process Data

cheap <- subset(rentals, price < 401)
hist(cheap$price)

Process Data Some More

cheap_good <- subset(cheap, review_scores_rating > 98)
hist(cheap$price)

From Data Frame to SpatialPointsDataFrame

Create a SpatialPointsDataFrame

Use the sp::coordinates() method

Requires a vector indicating the x,y columns

SP will create a SpatialPointsData Fram from csv

SpatialPointsDataFrame (SPDF)

#First make a copy
cheap_good_orig <- cheap_good

coordinates(cheap_good) <- c('longitude','latitude')
class(cheap_good)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

Compare SPDF to DF

str(cheap_good_orig)
## 'data.frame':    374 obs. of  14 variables:
##  $ id                  : int  18139835 172943 149108 10720392 14594885 15546103 13516315 10058568 1721354 3585034 ...
##  $ name                : Factor w/ 1203 levels " Victorian Suite in Inner Mission",..: 267 917 904 504 448 323 948 314 519 389 ...
##  $ latitude            : num  37.8 37.8 37.8 37.8 37.8 ...
##  $ longitude           : num  -122 -122 -122 -122 -122 ...
##  $ property_type       : Factor w/ 4 levels "Apartment","Condominium",..: 1 3 3 1 2 3 1 1 3 1 ...
##  $ room_type           : Factor w/ 1 level "Entire home/apt": 1 1 1 1 1 1 1 1 1 1 ...
##  $ accommodates        : int  4 3 3 6 4 4 4 4 4 4 ...
##  $ bathrooms           : num  1 1 1 1.5 2 1 2 2 1 1.5 ...
##  $ bedrooms            : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ beds                : int  2 2 2 2 2 2 3 3 2 2 ...
##  $ price               : int  275 279 300 250 200 225 350 215 389 200 ...
##  $ review_scores_rating: int  100 100 100 100 99 100 100 100 100 100 ...
##  $ neighbourhood       : Factor w/ 52 levels "Alamo Square",..: 52 7 7 20 52 20 7 7 20 7 ...
##  $ listing_url         : Factor w/ 1206 levels "https://www.airbnb.com/rooms/1003756",..: 490 446 293 41 268 336 200 3 439 776 ...

SPDF

str(cheap_good)

SPDF Slots

You can see from str(cheap_good) that a SPDF object is a collection of slots or components. The key ones are:

  • @data data frame of attributes that describe each location
  • @coords the coordinates for each location
  • @bbox the min and max bounding coordinates
  • @proj4string the coordinate reference system defintion as a string

SPDF Slots

Review the output of each of these:

summary(cheap_good)
head(cheap_good@coords)
head(cheap_good@data)
cheap_good@bbox
cheap_good@proj4string

What's missing

Are all the columns that were present in the DF now in the SPDF?

Is there a slot without data?

What is the CRS of the data?

cheap_good@proj4string # get a CRS object
## CRS arguments: NA

Define a CRS

Defining the CRS of a spatial data object enables the software to locate the coordinates in a specific space.

Geospatial-aware software will have definitions for supported Earth referenced coordinate systems

For example…

To define a CRS

You need to know

  • the specific CRS for the data
  • how to define the CRS object
  • how to assign it to the spatial data

Define & Assign a CRS

Known as defining a projection in ArcGIS

Defining a CRS != Transforming CRS

  • Defining does nothing to the coordinates themselves

We will get to transformations soon!

CRS Objects

Need the proj4 string or code that defines the CRS

proj4 is a widely used library of CRS definitions and functions.

Let's do this

sp includes the CRS() function for creating a CRS object

  • See ?CRS for details
# Create a CRS object from its proj4 string
WGS84_CRS <- CRS("+proj=longlat +datum=WGS84 +no_defs 
                    +ellps=WGS84 +towgs84=0,0,0") 

# Set the CRS of the SPDF
proj4string(cheap_good) <- WGS84_CRS

# check it
cheap_good@proj4string
## CRS arguments:
##  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0

Two other ways to define the CRS

# or use the EPSG code directly
proj4string(cheap_good) <- CRS("+init=epsg:4326") 

# or enter the string
proj4string(cheap_good) <- CRS("+proj=longlat 
                               +ellps=WGS84 +datum=WGS84 +no_defs")  

Define Incorrectly?

What happens if we assign the wrong CRS?

or not at all?

Software doesn't care but you need to!

Make sure it is defined correctly

# 4326 is the code for WGS84
proj4string(cheap_good) <- CRS("+init=epsg:4326") 

Finding CRS Codes

See http://spatialreference.org/

Use this site to find EPSG codes and proj4 CRS strings

Because I can't remember a CRS string but I can remember ~10 codes

Common CRSs

Geographic

  • 4326 Geographic, WGS84 (default for lon/lat)

  • 4269 Geographic, NAD83 (USA Fed agencies like Census)

Projected

  • 5070 USA Contiguous Albers Equal Area Conic

  • 3310 CA ALbers Equal Area

  • 26910 UTM Zone 10, NAD83 (Northern Cal)

  • 3857 Web Mercator (web maps)

Challenge

Use http://spatialreference.org/ to make an educated guess as to the CRS of these coordinates:

X = 549228.058249, Y = 4176578.444299

Strategy:

  • review the bounding box coordinates for the CRSs referenced by the above codes.

Challenge 2

What are the units for that CRS?

Mapping Spatial Objects

plot

You can use the standard R plot command to make a basic map of one or more sp objects.

plot(cheap_good)

plot options

You can enhance the basic plot with custom graphical parameters.

See: see the Quick-R website's page for graphical parameters.

plot(cheap_good, col="red", bg="lightblue", pch=21, cex=2)

spplot

sp includes a plotting method spplot

Use it to create quick maps

Or to create great maps

But complex, non-intuitive syntax = long ugly code

See examples in sp Gallery: Plotting maps with sp

spplot of the rental data by price

spplot(cheap_good,"price")

Challenge

Use spplot to create data maps from some of the other columns in the @data slot

Getting help:

?spplot

Examples

spplot(cheap_good,"bathrooms")

spplot(cheap_good,"accommodates")

spplot(cheap_good,"property_type")

spplot(cheap_good,"neighbourhood")

Spatial Data File formats

Common Spatial Data File formats

ESRI Shapefile

This is one of the most, if not the most common spatial vector data file formats.

Old but everywhere!

Gotchas: 2GB limit, 10 char column names, poor unicode support

Reading in Geospatial Data

There's an R package for that!

rgdal

rgdal is an R port of the powerful and widely used GDAL library.

It is the most commonly used R library for importing and exporting spatial data.

  • OGR: for vector data: readOGR() and writeOGR()

  • GDAL for raster data: readGDAL() and writeGDAL()

rgdal

library(rgdal)
## Warning: package 'rgdal' was built under R version 3.4.3
## rgdal: version: 1.2-16, (SVN revision 701)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-5
# See what file types are supported by rgdal drivers
# ogrDrivers()$name

Getting help

gdal.org

`?readOGR

For more info on working with rgdal to load different types of spatial data in R see this excellent tutorial by Zev Ross.

Read in Shapefile

sfboundary <- readOGR(dsn="data",layer="sf_boundary")
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "sf_boundary"
## with 1 features
## It has 3 fields

Read in Shapefile

sfboundary <- readOGR(dsn="data",layer="sf_boundary")
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "sf_boundary"
## with 1 features
## It has 3 fields
# or
# sfboundary <- readOGR("data","sf_boundary")
# but not
#sfboundary <- readOGR(dsn="data/",layer="sf_boundary")

Check out the data structure

str(sfboundary)  
summary(sfboundary)

Make a quick plot to check the data

How?

Make a quick plot to check the data

plot(sfboundary)

Take a look at the attribute data

How?

Take a look at the attribute data

head(sfboundary@data)   
##            NAME Pop2010 Land_sqmi
## 0 San Francisco  805235     46.87

Take a look at the CRS info

How?

Take a look at the CRS info

Is it defined? Geographic or projected CRS?

sfboundary@proj4string
## CRS arguments:
##  +proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80
## +towgs84=0,0,0

Plot with Rental Data

plot(sfboundary)
points(cheap_good, col="red")

Plot with Rentals

Where are the points?

plot(sfboundary)
points(cheap_good, col="red")

What's Wrong?

Compare the CRSs, are they the same?

proj4string(sfboundary)
proj4string(cheap_good)
proj4string(sfboundary) == proj4string(cheap_good)

Compare the CRSs, are they the same?

proj4string(sfboundary)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(cheap_good)
## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
proj4string(sfboundary) == proj4string(cheap_good)
## [1] FALSE

Compare the coord data

sfboundary@bbox
##         min       max
## x  542696.6  556659.9
## y 4173563.7 4185088.6
cheap_good@bbox
##                  min        max
## longitude -122.50647 -122.38947
## latitude    37.70738   37.80646

CRS Problems

The #1 reason…

CRS Transformations

All geospatial data should have the same CRS.

When they don't, they need to be transformed to a common CRS.

  • especially for spatial analysis

This is also called a projection transformation,

  • or projecting or reprojection.

Transform the CRS

Use sp function spTransform

Requires as input:

  • a spatial object to transform with a defined CRS
    • Input spatial object must have a defined CRS
  • a CRS object that indidicates the target CRS

Outputs a new spatial object with coordinate data in the target CRS

  • It does not transform the source data!

Transform sfboundary

sf_lonlat <- spTransform(sfboundary, WGS84_CRS)

Did it work?

How will we know?

Do the CRSs match?

proj4string(cheap_good) == proj4string(sf_lonlat)

Overlay the data in space

plot(sf_lonlat)
points(cheap_good, col="red")
points(cheap_good[cheap_good$price<100,], col="green", pch=19)

spTransform 4 Ways

Use CRS to that of another data layer

sf_lonlat <- spTransform(sfboundary, CRS(proj4string(cheap_good)))

Use CRS string

sf_lonlat <- spTransform(sfboundary, CRS("+proj=longlat +ellps=WGS84 
    +datum=WGS84 +no_defs"))

USE CRS code

sf_lonlat <- spTransform(sfboundary, CRS("+init=epsg:4326"))

Use a CRS object

WGS84_CRS <- CRS(proj4string(cheap_good))
sf_lonlat <- spTransform(sfboundary, WGS84_CRS)

Save the shapefile

Use writeOGR to save sf_lonlat to a new shapefile

See ?writeOGR for help

# write transformed data to a new shapefile 
writeOGR(sf_lonlat, dsn = "data", layer =  "sf_bounary_geo", 
          driver="ESRI Shapefile")

# is it there?
dir("data")

Projections, CRS, oh my!

We want all data in the same CRS

Which one is best?

Line Data

Line Data

Let's read in some line data

In the popular GeoJSON file format

Reading in GeoJSON File

sf_streets <- readOGR(dsn='data/sf_highways.geojson', layer="OGRGeoJSON")
## OGR data source with driver: GeoJSON 
## Source: "data/sf_highways.geojson", layer: "OGRGeoJSON"
## with 246 features
## It has 5 fields

GeoJSON

Plot the data

plot(sf_streets)

Examine the data

str(sf_streets)
summary(sf_streets)
  • What is the CRS of the data?
  • Are the data projected?
  • Does the CRS match that of the other layers?

Challenge

Plot all the data on one map

How do we do that?

Recall our earlier examples.

Map all the data

plot(sf_lonlat)
lines(sf_streets)
points(cheap_good, col="red")
points(cheap_good[cheap_good$price<200,], col="green")

Order Matters

Layers draw in order added to plot.

The GIS Layer Cake

Should be: Rasters > Polygons > Lines > Points

Order can also matter for Features!

Disorder - where are the points?

plot(cheap_good, col="red")
lines(sf_streets)
plot(sf_lonlat, add=TRUE)  # note syntax

RECAP

  • Read & write data in CSV, Shapefile and GeoJSON formats
    • coordinates() - data frame to SPDF
    • rgdal::readOGR and writeOGR
  • Created Spatial*DataFrames

  • Defined CRS with proj4string()

  • Transformed CRS with spTransform()

  • Mapped data with plot() and spplot

  • Mapped multiple geospatial data layers

  • Terminology

QUESTIONS?

Data Driven Maps

Data Driven Maps

Also called thematic maps or data maps

Use data values to determine symbology

Use symbology to convey data values / meaning

Important for all phases of the research process, from exploratory analysis through reporting results.

R packages for Mapping

Lots of them

Let's quickly discuss the most common ones

plot

Pros

  • Quick basic maps
  • Easy to map multiple layers
  • Works with sp objects

Cons

  • Requires lots of complex code to make pretty

spplot

Pros

  • Quick thematic maps of one layer
  • More config options for sp objects than plot

Cons

  • Pretty maps require lots of complex code

ggplot2 and ggmap

Pros

  • beautiful maps with ggplot2
  • builds on existing R knowledge of ggplot2
  • Some great geospatial data functionality in ggmap
  • works with data frames

Cons

  • Doesn't work with sp objects
  • Limited support for CRS and spatial operations
    • Expects geographic coordinates (longitude and latitude) as input
  • Complex syntax and methods

tmap

Pros

  • Quick and powerful maps
  • Works with sp objects & CRSs
  • Syntax similar to but not as complex as ggplot2
  • Easy to save as static or interactive tmaps

Cons

  • No basemaps in static mode

leaflet

Pros

  • R port of the popular Javascript library

  • Allows one to create interactive maps that can be served on web

  • Highly customizeable!

Cons

  • Highly customizeable!

tmap

tmap

tmap stands for thematic map

Great maps with less code than the alternatives

Syntax should be familar to ggplot2 users, but simpler

Good starting points

tmap

Load the library

library(tmap)
## Warning: package 'tmap' was built under R version 3.4.3

US Population by State

Let's explore population data for US states.

Data are in the file data/us_states_pop.shp

Challenge

Use readOGR to load it and plot to view it

Read in & plot the Data

uspop <- readOGR("./data", "us_states_pop")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "us_states_pop"
## with 49 features
## It has 4 fields
plot(uspop)

Basic Questions

Review the Data with commands we used earlier

  • What type of data object is usopen

  • How many features does it contain?

  • How many attributes describe those features?

  • What is the CRS?

Quick tmap - qtm

Make a quick plot with default options

qtm(uspop)

qtm of a data value

qtm(uspop,"POPULATION")

tmap Shapes and Graphic Elements

tmap's flexibility comes in how it intitively allows you to layer spatial data and style the layers by data attributes

Use tm_shape(<sp_object>) to specifiy a geospatial data layer

Add + tm_<element>(...) to style the layer by data values

…and other options for creating a publication ready map

Exploring tmap functionality

?tm_polygons

Customizing the display

tm_shape(uspop) + tm_polygons(col="beige", border.col = "red")

or

tm_shape(uspop) + tm_polygons(col="#f6e6a2", border.col = "white")

Challenge

Create a tmap of Population with different fill and outline colors

You can use color names or HEX values

See ?tm_polygons and try customizing another parameter

My Map

tm_shape(uspop) + tm_polygons(col="#f6e6a2", border.col = "black", alpha=0.5, lwd=0.5)

CRS and Shape

Notice anything odd about shape of USA?

tm_shape(uspop) + tm_polygons(col="#f6e6a2", border.col = "white")

Projected CRSs for Mapping

Setting the CRS Dynamically

tm_shape(uspop, projection="+init=epsg:5070") + tm_polygons(col="#f6e6a2", border.col = "white")

Dynamic CRS Transformations

Also called On-the-fly reprojection in ArcGIS & QGIS

Very cool!

BUT, if you want to use data in a different CRS it is best to transform it.

Challenge

  • Transform the uspop data to the USA Contiguous Albers CRS (5070),

  • Save output as a new SpatialPolygonsDataFrame called uspop_5070

uspop_5070

uspop_5070 <- spTransform(uspop, CRS("+init=epsg:5070"))

Plotting the Transformed Data

tm_shape(uspop_5070) + tm_polygons(col="#f6e6a2", border.col = "white")

What's happening here?

tm_shape(uspop_5070) + tm_polygons(col="#f6e6a2", border.col="#f6e6a2") +
tm_shape(uspop) + tm_borders(col="purple")

Choropleth Maps

Choropleth maps

Color areas by data values

Fun name for a type of data map

Sometimes called heatmap

Choropleth Map of State Population

tm_shape(uspop_5070) + tm_polygons(col="POPULATION")

Keys to Great Choropleth Maps

  1. Color Palettes

  2. Data Classification

Color

rColorBrewer Package

The R package RColorBrewer is widely used to select color palettes for maps.

Read the package help for more info.

?RColorBrewer or see colorbrewer.org

library(RColorBrewer)

Exploring Brewer Palettes

Use display.brewer functions to see available color palettes.

display.brewer.all()

Qualitative Color Palettes

Complementary colors, e.g. pastels, to emphasize different categories not magnitudes or order.

display.brewer.all(type="qual")

Sequential Color Palettes

Light to dark shades of the same or complimentary colors to imply order, e.g. higher to lower ranks or values

display.brewer.all(type="seq")

Divergent Color Palettes

Two contrasting sequential color palettes at either end to emphasize outliers with a lighter palette in the middle to reveal trends.

display.brewer.all(type="div")

Applying a Color Palette

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", palette="BuPu")

Challenge

Try a sequential, divergent and qualitative color palette with this dataset.

Set auto.palette.mapping=FALSE and TRUE with the SPECTRAL palette.

  • You can read about this option in ?tm_polygons

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", palette="Spectral", 
                                auto.palette.mapping=FALSE)

Data classification

Why Data classification

Unclassified maps scale full range of values to color palette

  • Great for exploring trends and outliers,

  • but hard to interpret specific data values.

  • The eye can only differentiate a few colors.

Data classification

Important for improving the cartographic display of continuous data

Reduces complexity by grouping continuous values into a small number of bins, typically 5-7.

Unique symbology - color - is then associated with each bin.

A legend indicates the association, making it easier to interpret data values.

Data Classification Methods

Common methods for binning data into a set of classes include:

  • equal interval: classes the data into bins of equal data ranges, e.g. 10-20, 20-30, 30-40.

  • quantile: classes the data into bins with an equal number of data observations. This is the default for most mapping software.

  • fisher/jenks/natural breaks: classes data into bins that minmizie within group variance and maximize between group variance.

  • standard devation: classes emphasize outliers by classing data into bins based on standard deviations around the mean, eg -2 to + 2 SDs.

Data classification in tmap

tmap uses the classificaiton methods in the classIntervals package

The class method is set by the style parameter in tm_polygons - or other style element, eg tm_symbols

See ?tm_polygons for keyword options

tmap data classification - Quantiles

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", style="quantile")

tmap data classification - Jenks

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", style="jenks")

Challenge

Try a different classification scheme.

Note how it impacts the display of the data and thus communicate different messages

What is the default classification scheme used by tmap?

Number of Classes

In addition to setting the classification method you can set the number of classes, or cuts.

By default this is 5

Explore setting it to 3 or 9 with n=

tm_shape(uspop_5070) + tm_polygons(col="POPULATION", palette="BuPu", 
                                    style="jenks", n=9)

Cartography

The Art and Science of Map Making

Science:

  • the data, the domain,
  • choice of classification scheme

Art:

  • communication & visualization theory
  • choice of color, shape, size, transparency, weight, context

Challenge

Map popdens instead of POPULATION

Don't use defaults

  • Set color palette and classification style

Faceting

tm_shape(uspop_5070) + tm_polygons(col=c("POPULATION","popdens"), 
                              style=c("jenks","jenks"))

Mapping Polygons

Sometimes polygons get in the way of what you want to communicate. Why?

What state has the greatest population density?

tm_shape(uspop_5070) + tm_symbols(col="popdens", style="jenks")

Polygons as Points

Use tm_symbols to map the data as points

tm_shape(uspop_5070) + tm_symbols(col="popdens", style="jenks")

Mapping multiple layers

tm_shape(uspop_5070) + tm_polygons(col="white", border.alpha = 0.5) + 
  tm_shape(uspop_5070) + tm_symbols(col="popdens", style="jenks")

Point Map Terminology

Graduated Color Map when data are classified

Proportional Color Map when they are not

Challenge

Change the marker shape to square

Symbol Maps

We have using color to communicate data values.

We can also use size.

Proportional and Graduated Symbol maps vary symbol size by data value

Graduated Symbol Maps

What changed to make a symbol not color map?

tm_shape(uspop_5070) + tm_polygons(col="white", border.alpha = 0.5) + 
  tm_shape(uspop_5070) + tm_symbols(size="popdens", style="jenks")

Challenge

Take a look at ?tm_symbols and adjust the graduated symbol map

Change the outline and fill colors of the point symbols and add fill transparency.

Also customize the legend title

Take 2

tm_shape(uspop_5070) + tm_polygons(col="white", border.alpha = 0.5) + 
  tm_shape(uspop_5070) + tm_symbols(size="popdens", style="sd", 
     size.lim=c(1,500), col="purple", alpha=0.5, border.col="grey", 
     title.size="Population Density (km2)")

Fancy tmap

tm_shape(uspop_5070 ) + tm_polygons("POPULATION") + 
    tm_scale_bar(position=c("left","bottom")) + 
    tm_compass(position=c("right","center")) + 
    tm_style_classic(title="Patty's Map", 
    title.position=c("center","top"), inner.margins=c(.05,.05, .15, .25))

Fancy tmap

SO… What state has highest pop density?

Interactive tmaps

Plot vs View Modes

tmap has two modes: plot and view

The plot mode is the default and is for creating static maps.

The view mode is for creating interactive maps using the leaflet Javascript library.

You can switch back and forth with the tmap_mode() function.

Same syntax for static and interactive tmaps!

View mode

tmap_mode("view")
 
tm_shape(uspop_5070) + tm_symbols(size="popdens", style="sd", 
    col="purple", border,col="black", alpha=0.5)

Add Popup Content

?tmap_mode

tmap_mode("view")

tm_shape(uspop_5070) + tm_symbols(size="popdens", style="sd", 
    col="purple", border.col="black", alpha=0.5, 
    popup.vars=c("NAME","POPULATION","popdens"))

Plot Mode

Return to plot mode

tmap_mode('plot')
## tmap mode set to plotting

tmap Choropleth Maps

Legend auto added to tmap interactive choropleth maps

mymap <- tm_shape(uspop_5070) + 
  tm_polygons("popdens", style="jenks", title="Population Density")

tm_view(mymap)

Big Challenge

San Francisco Data Map

Make a graduted color map of SF Airbnb rentals (cheap_good)

  • overlay the rentals on top of the boundary of sf and the sf streets

Make an interacive map

  • a popup that includes the name, price and another attribute or two

Solution 1. Static Map

tm_shape(sf_lonlat) + 
    tm_polygons(col="beige", border.col = "blue") + 
tm_shape(sf_streets) + 
    tm_lines(col="black", lwd = 3) +
tm_shape(sf_streets) + 
    tm_lines(col="white", lwd = 1) +
tm_shape(cheap_good) + 
    tm_symbols(col="red", size=0.5, alpha=0.5, style="jenks")

Solution 1. Static Map

Solution 2. Interactive Map

tmap_mode("view")

tm_shape(sf_lonlat) + 
    tm_polygons(col="beige", border.col = "blue") + 
tm_shape(sf_streets) + 
    tm_lines(col="black", lwd = 3) +
tm_shape(sf_streets) + 
    tm_lines(col="white", lwd = 1) +
tm_shape(cheap_good) + 
    tm_symbols(col="red", size = 0.5, alpha=0.5, style="jenks", 
    popup.vars=c("name","price","listing_url"))
    
tmap_mode("plot")

More Maps

Category Maps

Mapping categorical (or qualitative) data with

tm_shape(sfboundary) + 
  tm_polygons(col="beige") + 
tm_shape(cheap_good) + 
  tm_symbols(shape="property_type", size=0.5)

Category Maps

Multivariate

bigmap <- tm_shape(sfboundary) + tm_polygons(col="beige") + 
  tm_shape(cheap_good) + 
  tm_symbols(size="accommodates", title.size="Accomodates", col="price", 
      title.col="Price", shape="property_type", title.shape="Property Type") +
  tm_layout( legend.bg.color="white",inner.margins=c(.05,.05, .15, .25), 
      title="Airbnb 2 Bedroom Rentals, San Francisco Fall 2017", 
      legend.position=c("right","center"))
 
 bigmap

Multivariate

save_tmap

You can save static and interactive maps with tm_save

See: ?save_tmap for details

map1 <- tm_shape(uspop_5070) + tm_polygons(col="POPULATION", style="jenks")
save_tmap(map1, "tmap_choropleth.png", height=4) # Static image file
save_tmap(map1, "tmap_choropleth.html") # interactive web map

Sharing your interactive Map

  • Rpubs
  • Github
  • Web server
  • Or not, just serve locally or email

Doing more with tmap

leaflet Package

There are several R packages that output leaflet maps.

Use the leaflet package for more customized leaflet maps

Highly recommend if you want to make interactive maps.

See the RStudio Leaflet tutorial.

Simple leaflet map

library(leaflet)
leaflet(cheap_good) %>% addTiles() %>%
    addCircleMarkers(data = cheap_good, radius = 5, stroke=F,
    color = "purple", fillOpacity = 0.75
  )

Add palette to leaflet map

pal <- colorQuantile("Reds",NULL,5)
leaflet(cheap_good) %>% addTiles() %>%
    addCircleMarkers(
      data = cheap_good,
      radius = 6,
      color = ~pal(price),
      stroke = F, 
      fillOpacity = 0.75
  )

Add popup to leaflet map

popup_content <- cheap_good$name
popup_content <- paste0(popup_content, "<br>Price per night: $", cheap_good$price)
popup_content <- paste0(popup_content, "<br><a href=",cheap_good$listing_url,">More info...</a>")

leaflet(cheap_good) %>% addTiles() %>%
    addCircleMarkers(
      data = cheap_good,
      radius = 6,
      color = ~pal(price),
      stroke = F, 
      fillOpacity = 0.75,
      popup = popup_content)

Questions?

What we didn't cover

What we didn't cover

  • Raster data

  • Geoprocessing

  • Spatial Analysis

Next week, Part II

Teaser

What do you think this code does?

Think about it

Try it and see

Check the help ?spDist

coit_tower <- c("-122.405837,37.802032") 

cheap_good$coit_dist <- 
  spDistsN1(cheap_good,c(-122.405837,37.802032), longlat = T) 

head(cheap_good@data)

The future is sf

Selected References

Session Info

sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RColorBrewer_1.1-2 tmap_1.11          rgdal_1.2-16      
## [4] sp_1.2-5          
## 
## loaded via a namespace (and not attached):
##  [1] viridisLite_0.2.0  jsonlite_1.5       splines_3.4.1     
##  [4] geojsonlint_0.2.0  foreach_1.4.4      R.utils_2.6.0     
##  [7] gtools_3.5.0       shiny_1.0.5        expm_0.999-2      
## [10] stats4_3.4.1       yaml_2.1.16        LearnBayes_2.15   
## [13] backports_1.1.2    lattice_0.20-35    digest_0.6.13     
## [16] colorspace_1.3-2   htmltools_0.3.6    httpuv_1.3.5      
## [19] Matrix_1.2-12      R.oo_1.21.0        plyr_1.8.4        
## [22] XML_3.98-1.9       rmapshaper_0.3.0   raster_2.6-7      
## [25] gmodels_2.16.2     xtable_1.8-2       webshot_0.5.0     
## [28] scales_0.5.0       gdata_2.18.0       satellite_1.0.1   
## [31] gdalUtils_2.0.1.7  mapview_2.2.0      magrittr_1.5      
## [34] mime_0.5           deldir_0.1-14      evaluate_0.10.1   
## [37] R.methodsS3_1.7.1  nlme_3.1-131       MASS_7.3-48       
## [40] class_7.3-14       tools_3.4.1        geosphere_1.5-7   
## [43] stringr_1.2.0      V8_1.5             munsell_0.4.3     
## [46] compiler_3.4.1     e1071_1.6-8        units_0.4-6       
## [49] classInt_0.1-24    grid_3.4.1         tmaptools_1.2-2   
## [52] RCurl_1.95-4.10    dichromat_2.0-0    iterators_1.0.9   
## [55] htmlwidgets_0.9    crosstalk_1.0.0    bitops_1.0-6      
## [58] base64enc_0.1-3    rmarkdown_1.8      boot_1.3-20       
## [61] codetools_0.2-15   DBI_0.7            jsonvalidate_1.0.0
## [64] curl_3.1           R6_2.2.2           knitr_1.17        
## [67] udunits2_0.13      rgeos_0.3-26       rprojroot_1.3-1   
## [70] spdep_0.7-4        KernSmooth_2.23-15 stringi_1.1.6     
## [73] osmar_1.1-7        Rcpp_0.12.14       sf_0.5-5          
## [76] png_0.1-7          leaflet_1.1.0      spData_0.2.6.9    
## [79] coda_0.19-1

Output code to script

library(knitr)
purl("r-geospatial-workshop-f2017.Rmd", output = "scripts/r-geospatial-workshop-f2017.r", documentation = 1)